Crime Prediction in San Francisco
Authors
Abstract
In June 2015, Kaggle began a competition named “San Francisco Crime Classification” [8], ending in June 2016. The competition’s dataset caught our attention because the subject is very tangible: crime is at the forefront of modern media, and San Francisco is culturally significant due to its current tech industry. The dataset is also described by geographic and temporal features, which enables potentially interesting visualizations. After our initial investigation of the dataset and the Kaggle competition, we found a large amount of accessible information on different ways of analyzing this data, in blog posts and in scripts published for the competition on Kaggle and Kaggle’s forums. As with any Kaggle competition, the competition rules also provide a means of evaluation. We chose this dataset for these reasons. The dataset accompanying the “San Francisco Crime Classification” competition, provided by SF OpenData, consists of 878,049 samples of crime reports from all neighborhoods of San Francisco, spanning January 2003 to May 2015. Kaggle initially splits the data into two sets, a training set and a test set. Odd-numbered weeks (1, 3, 5, 7, . . . ) are put in the training set, and even-numbered weeks (2, 4, 6, 8, . . . ) are put in the test set. The fields in each sample point are given in Table 1.
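The alternating-week split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how weeks are counted, so the sketch assumes consecutive 7-day weeks numbered from a hypothetical start date (`START`), and the function names `week_number` and `assign_split` are invented for this example.

```python
from datetime import date


def week_number(day: date, start: date) -> int:
    """Return the 1-based index of the 7-day week containing `day`,
    counting consecutive weeks from `start` (an assumed convention)."""
    return (day - start).days // 7 + 1


def assign_split(day: date, start: date) -> str:
    """Odd-numbered weeks go to the training set and even-numbered
    weeks to the test set, following the description in the abstract."""
    return "train" if week_number(day, start) % 2 == 1 else "test"


# Hypothetical start of week 1; the real competition may count weeks differently.
START = date(2003, 1, 1)

if __name__ == "__main__":
    for d in (date(2003, 1, 3), date(2003, 1, 9), date(2003, 1, 16)):
        print(d.isoformat(), week_number(d, START), assign_split(d, START))
```

Because whole weeks alternate between the two sets, every part of the 2003 to 2015 range is represented in both training and test data, rather than the test set covering only the most recent period.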
Similar resources
RoboCop: Crime Classification and Prediction in San Francisco
In this paper, we apply machine learning and other statistical techniques to the problems of classifying and predicting crimes in San Francisco. Drawing upon existing research in the field to approach these two problems, we employ Random Forest and VAR(p) models, respectively. For the classification problem, our results across all 39 crime categories demonstrate the difficulty of the fully-spe...
Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment
Fig 2 : Overview of the classifier system. Best estimate clinical diagnosis used as ground truth. Objective Language Feature Analysis in Children with Neurodevelopmental Disorders during Autism Assessment Manoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth Narayanan Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles Department...
Peep show establishments, police activity, public place, and time: a study of secondary effects in San Diego, California.
An empirical study was undertaken in San Diego, California, to test assumptions made by the government and by conservative religious policy advocates that there is a greater incidence of crime in the vicinity of peep show establishments. We asked two questions: (a) Is criminal activity in San Diego particularly acute at peep show establishments compared to surrounding control locations? and (b)...
CS 229 Project Report: San Francisco Crime Classification
Different machine learning approaches were conceptualized and implemented for predicting the probabilities of crime categories for crimes reported in San Francisco. The crime records used in the research were downloaded from a competition on Kaggle. A Bayesian model, a mixture of Gaussians model (stratified and unstratified), and logistic regression are implemented. A satisfactory result was ac...
San Francisco Crime Classification
San Francisco Crime Classification is an online competition administered by Kaggle Inc. The competition aims at predicting future crimes based on a given set of geographical and time-based features. In this paper, I achieved an accuracy that ranks in the top 18%, as of May 19th, 2016. I will explore the data and explain in detail the tools I used to achieve that result.
Journal title:
Volume and issue:
Pages: -
Publication date: 2015